{\it
    ABSTRACT---The problem of interest is a set of tasks (histories) that require a long subtask (trace) to be completed before a number of short subtasks (retraces).

    Aaron Bevill considered smaller problems with many retraces per trace, in which the retrace subtasks dominate the computational burden. Because the retrace algorithm is amenable to GPU acceleration, this burden can be reduced significantly. Aaron implemented the retrace step with GPU acceleration, reducing the retrace time by 75\%.

    Mateusz Monterial considered MPI problems in which the domain must be decomposed, so that the trace logs must be shared between processes. The batch size (number of trace logs) should be large to minimize the number of barriers, but small enough for the batch to fit in memory. Mateusz implemented a ring-passing algorithm to avoid the large transfers and scalability problems associated with MPI\_ALLGATHER.
}
